Libraries imported successfully.
Environment configured for regression analysis.
Local file not found. Retrieving from GitHub repository...
Data retrieved and cached locally.
Sample size: 6,319 listings
Variables: 13 columns
Price range: £10.00 - £74100.00
Dataset Overview:
   Observations: 6,319
   Variables: 13

Sample preview (first 5 observations):
    price  accommodates  bedrooms  beds        room_type       property_type  \
0   126.0             4       1.0   1.0  Entire home/apt  Entire rental unit
1   225.0             5       3.0   3.0  Entire home/apt         Entire home
2  2400.0             8       4.0   4.0  Entire home/apt  Entire rental unit
3   150.0             4       2.0   2.0  Entire home/apt        Entire condo
4   180.0             6       4.0   2.0  Entire home/apt         Entire home

    latitude  longitude  availability_365  minimum_nights  maximum_nights  number_of_reviews  bathrooms
0  51.514609  -0.136069                39               1             365                 15        1.0
1  51.398840  -0.290510               315               2              70                 21        1.5
2  51.500550  -0.017170               364               1            1125                  0        3.0
3  51.506070  -0.218960               273               2             120                 38        2.0
4  51.441898  -0.195032               353               5             365                  1        2.5
Note: 64 listings above £1214 excluded from chart for clarity
Average price: £220.48 per night
Median price: £135.00
Cheapest listing: £10.00
Most expensive listing: £74100.00
Note: 64 extreme outliers (> £1214) excluded for clarity
Note: Showing 99% of data (prices ≤ £1214)
Average prices by room type:
  Hotel room: £549.11/night
  Entire home/apt: £281.90/night
  Private room: £85.99/night
  Shared room: £38.84/night
Note: 64 extreme price outliers excluded for clarity
Note: 62 extreme price outliers excluded for clarity
Correlation Interpretation:
  r > 0.7: Strong positive association
  0.3 ≤ r ≤ 0.7: Moderate positive association
  -0.7 ≤ r ≤ -0.3: Moderate negative association
  r < -0.7: Strong negative association
  |r| < 0.3: Weak or no linear relationship
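
The interpretation bands above can be written as a small helper. This is a minimal sketch, not the notebook's own code: the function name is hypothetical, and it assumes the 0.3 and 0.7 cut-offs apply symmetrically to negative correlations.

```python
def describe_correlation(r: float) -> str:
    """Label a Pearson correlation using 0.3 / 0.7 cut-offs (assumed symmetric)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    magnitude = abs(r)
    if magnitude < 0.3:
        return "weak or no linear relationship"
    # Above the weak band, strength depends only on |r|, direction on the sign.
    strength = "strong" if magnitude > 0.7 else "moderate"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} association"
```

Making the bands mutually exclusive avoids a value such as r = 0.8 matching both the "strong" and "moderate" rules.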
Geographic Insights:
  Central London (higher density) shows elevated pricing
  Price gradient visible from city center to periphery
  Yellow/light colors = Higher priced listings
  Purple/dark colors = Lower priced listings
Availability Statistics:
  Mean availability: 217 days/year
  Median availability: 248 days/year
  Fully available (365 days): 349 listings (5.5%)
  Not available (0 days): 81 listings (1.3%)

Business Insight: Bimodal distribution suggests full-time vs. occasional hosting strategies
Minimum Nights Statistics:
  Median minimum nights: 2
  1-night stays allowed: 2,336 listings (37.0%)
  Weekly minimum (7+ nights): 762 listings (12.1%)

Business Insight: Longer minimum stays often correlate with lower nightly rates (volume pricing strategy)
Outlier Detection (IQR Method):
  Q1 (25th percentile): £77.00
  Q3 (75th percentile): £223.50
  IQR: £146.50
  Lower bound: -£142.75 (below zero, so no low-price listings are flagged)
  Upper bound: £443.25

  Outliers detected: 457 listings (7.2%)
  Price range of outliers: £444.00 - £74100.00
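
The fences reported above follow from the usual 1.5 × IQR rule. A minimal sketch, assuming that rule and using the quartiles printed above (the function names are illustrative, not taken from the notebook):

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) outlier fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(price: float, lower: float, upper: float) -> bool:
    """A price is flagged when it falls strictly outside the fences."""
    return price < lower or price > upper


# Quartiles as reported in the output above.
lower, upper = iqr_fences(77.00, 223.50)
```

With these quartiles the lower fence is negative, so in practice only the upper fence can flag a listing.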
Missing Values Summary:
   Column  Missing_Count  Missing_Percent
bathrooms             69             1.09
 bedrooms             13             0.21
     beds             13             0.21
Total rows before check: 6,319
Duplicate rows found: 0
No duplicates to remove.
Observations after deduplication: 6,319
Logarithmic transformation applied: log_price = ln(1 + price).
Original price range: £10.00 - £74100.00
Transformed range: 2.40 - 11.21
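
The transformed range is consistent with ln(1 + price) rather than ln(price): ln(11) ≈ 2.40, whereas ln(10) ≈ 2.30. A minimal sketch of the transform and its inverse, assuming that form (the helper names are hypothetical):

```python
import math


def to_log_price(price: float) -> float:
    """Forward transform: ln(1 + price)."""
    return math.log1p(price)


def from_log_price(log_price: float) -> float:
    """Inverse transform: exp(log_price) - 1, back on the £ scale."""
    return math.expm1(log_price)
```

Predictions made on the log scale need `from_log_price` before they can be read as nightly prices.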
Selected 3 continuous predictors:
  - accommodates
  - bedrooms
  - beds
Added 3 room type indicator variables (reference category: Entire home/apt)

Total features for model: 6
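
Four room types yield three indicator columns because one level is dropped as the reference. A minimal sketch of that encoding, assuming "Entire home/apt" is the dropped level (it is the only room type without a `room_` column in the output); the function is illustrative, not the notebook's code:

```python
ROOM_TYPES = ["Entire home/apt", "Hotel room", "Private room", "Shared room"]
REFERENCE = "Entire home/apt"  # assumed reference level (no indicator column)


def encode_room_type(room_type: str) -> dict[str, bool]:
    """One-hot encode a room type, omitting the reference level."""
    if room_type not in ROOM_TYPES:
        raise ValueError(f"unknown room type: {room_type!r}")
    return {
        f"room_{level}": room_type == level
        for level in ROOM_TYPES
        if level != REFERENCE
    }
```

Dropping one level keeps the indicators from being perfectly collinear with the intercept, and each room-type coefficient is then read relative to the reference category.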
Median imputation applied:
  - bedrooms: 13 values imputed
  - beds: 13 values imputed
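
Median imputation replaces each missing entry with the median of the observed values in the same column. A minimal stdlib sketch, with `None` standing in for a missing value (the notebook presumably works on a DataFrame; this helper is hypothetical):

```python
from statistics import median


def impute_median(values: list) -> tuple[list, int]:
    """Fill None entries with the median of observed values.

    Returns the filled list and the number of values imputed.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    return filled, len(values) - len(observed)
```

The median is less sensitive than the mean to the handful of very large listings in this dataset.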
Analysis dataset prepared.
  Observations: 6,319
  Predictors: 6

Data types validation:
accommodates           int64
bedrooms             float64
beds                 float64
room_Hotel room         bool
room_Private room       bool
room_Shared room        bool
log_price            float64
dtype: object

First observations:
   accommodates  bedrooms  beds  room_Hotel room  room_Private room  \
0             4       1.0   1.0            False              False   
1             5       3.0   3.0            False              False   
2             8       4.0   4.0            False              False   
3             4       2.0   2.0            False              False   
4             6       4.0   2.0            False              False   

   room_Shared room  log_price  
0             False   4.844187  
1             False   5.420535  
2             False   7.783641  
3             False   5.017280  
4             False   5.198497  
Training data prepared.
  Training set: 6,319 observations (100% of data)

Data types in X_train:
accommodates           int64
bedrooms             float64
beds                 float64
room_Hotel room         bool
room_Private room       bool
room_Shared room        bool
dtype: object

Note: Model trained on entire dataset for maximum sample size.
BASELINE MODEL Results:
  Predictors: accommodates, bedrooms
  R² (Coefficient of Determination): 0.3511
  Adjusted R²: 0.3509
  RMSE (Root Mean Squared Error): 0.6474

Model Performance:
  Explained variance: 35.1%
  Adjusted for predictors: 35.1%
  Effect size: Large (Cohen's f² = 0.541)
FULL MODEL Results:
  Number of predictors: 6
  R² (Coefficient of Determination): 0.5004
  Adjusted R²: 0.4999
  RMSE (Root Mean Squared Error): 0.5681

Model Performance:
  Explained variance: 50.0%
  Adjusted for predictors: 50.0%
  Effect size: Large (Cohen's f² = 1.002)
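
Adjusted R² and Cohen's f² can both be recomputed from R², the sample size n, and the number of predictors k. A minimal sketch using the standard formulas and Cohen's conventional thresholds (0.02 small, 0.15 medium, 0.35 large); the function names are illustrative:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def cohens_f2(r2: float) -> float:
    """Cohen's f² = R² / (1 - R²)."""
    return r2 / (1 - r2)


def effect_size_label(f2: float) -> str:
    """Conventional labels: >= 0.35 large, >= 0.15 medium, >= 0.02 small."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"
```

With n = 6,319 the adjustment barely moves R², which is why the adjusted and unadjusted figures agree to three decimals.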

Model Comparison:
  Incremental variance explained: 14.9 percentage points
  ΔR² = 0.1493

Additional predictors improve fit (ΔR² = 0.1493; adjusted R² rises from 0.3509 to 0.4999).
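
In-sample R² cannot decrease when predictors are added, so ΔR² > 0 alone does not show that the extra terms matter. One common check for nested OLS models is a partial F statistic. The sketch below is not part of the original output; it applies the textbook formula to the reported R² values, with n = 6,319, two baseline predictors, and six full-model predictors:

```python
def partial_f(r2_base: float, r2_full: float, n: int,
              k_base: int, k_full: int) -> float:
    """Partial F statistic for nested OLS models.

    F = ((R²_full - R²_base) / q) / ((1 - R²_full) / (n - k_full - 1)),
    where q = k_full - k_base is the number of added predictors.
    """
    q = k_full - k_base
    if q <= 0:
        raise ValueError("the full model must have more predictors")
    numerator = (r2_full - r2_base) / q
    denominator = (1 - r2_full) / (n - k_full - 1)
    return numerator / denominator


f_stat = partial_f(0.3511, 0.5004, n=6319, k_base=2, k_full=6)
```

The statistic is compared against an F distribution with (q, n - k_full - 1) degrees of freedom; a value far above 1 indicates that the added predictors jointly improve the fit.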
Data prepared: 6319 observations, 6 features
Data types: [dtype('float64')]
================================================================================
STATSMODELS OLS REGRESSION RESULTS
================================================================================
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              log_price   R-squared:                       0.500
Model:                            OLS   Adj. R-squared:                  0.500
Method:                 Least Squares   F-statistic:                     1054.
Date:                Sun, 30 Nov 2025   Prob (F-statistic):               0.00
Time:                        18:38:59   Log-Likelihood:                -5392.9
No. Observations:                6319   AIC:                         1.080e+04
Df Residuals:                    6312   BIC:                         1.085e+04
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const                 4.6250      0.018    252.981      0.000       4.589       4.661
accommodates          0.1122      0.007     16.752      0.000       0.099       0.125
bedrooms              0.1644      0.012     13.828      0.000       0.141       0.188
beds                 -0.0523      0.009     -5.857      0.000      -0.070      -0.035
room_Hotel room       0.8721      0.190      4.595      0.000       0.500       1.244
room_Private room    -0.7310      0.018    -40.994      0.000      -0.766      -0.696
room_Shared room     -1.3692      0.134    -10.244      0.000      -1.631      -1.107
==============================================================================
Omnibus:                     2367.847   Durbin-Watson:                   2.009
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            16823.851
Skew:                           1.618   Prob(JB):                         0.00
Kurtosis:                      10.310   Cond. No.                         134.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

================================================================================
BUSINESS-FRIENDLY INTERPRETATION OF KEY STATISTICS
================================================================================
TOP 5 PREDICTORS BY ABSOLUTE COEFFICIENT MAGNITUDE:
==================================================
room_Shared room     β = -1.3692 (negative association)
room_Hotel room      β =  0.8721 (positive association)
room_Private room    β = -0.7310 (negative association)
bedrooms             β =  0.1644 (positive association)
accommodates         β =  0.1122 (positive association)
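
Because the dependent variable is on a log scale, a coefficient β corresponds to a multiplicative change of exp(β), or (exp(β) − 1) × 100 percent. The sketch below applies that conversion. One caveat: the target appears to be ln(1 + price) (log_price is 4.844187 for the £126 listing), so the percentage strictly applies to 1 + price, which is nearly the same at these price levels. The helper name is hypothetical.

```python
import math


def pct_effect(beta: float) -> float:
    """Percent change in the (log-scale) outcome for a one-unit increase."""
    return (math.exp(beta) - 1) * 100


# Coefficients as reported in the regression table above.
coefficients = {
    "room_Shared room": -1.3692,
    "room_Hotel room": 0.8721,
    "room_Private room": -0.7310,
    "bedrooms": 0.1644,
    "accommodates": 0.1122,
}
effects = {name: pct_effect(beta) for name, beta in coefficients.items()}
```

Room-type effects are relative to the omitted category, and the conversion is asymmetric: β = −0.7310 is a drop of roughly 52%, not 73%.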
Green bars = positive coefficients
Red bars = negative coefficients
Variance Inflation Factor (VIF) Analysis:
==================================================
          Feature       VIF
     accommodates 11.604452
         bedrooms  9.018756
             beds  8.733639
room_Private room  1.140507
 room_Shared room  1.046276
  room_Hotel room  1.000697

Interpretation:
  VIF < 5: Low multicollinearity (acceptable)
  VIF 5-10: Moderate multicollinearity (caution)
  VIF > 10: High multicollinearity (problematic)

(!) WARNING: 1 predictor (accommodates, VIF = 11.60) exhibits high multicollinearity; bedrooms and beds fall in the moderate range.
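
For predictor j, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing that predictor on all the others. The sketch below applies that definition and the bands listed above to the reported values; the function names are illustrative, and how the notebook computed its VIFs (for example, with or without an intercept) is not shown in the output.

```python
def vif_from_r2(r2_j: float) -> float:
    """VIF for a predictor whose auxiliary regression has R² = r2_j."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("r2_j must lie in [0, 1)")
    return 1.0 / (1.0 - r2_j)


def classify_vif(vif: float) -> str:
    """Bands from the interpretation above: < 5 low, 5-10 moderate, > 10 high."""
    if vif < 5:
        return "low"
    if vif <= 10:
        return "moderate"
    return "high"


# VIF values as reported in the table above.
reported = {
    "accommodates": 11.604452,
    "bedrooms": 9.018756,
    "beds": 8.733639,
    "room_Private room": 1.140507,
    "room_Shared room": 1.046276,
    "room_Hotel room": 1.000697,
}
flagged = [name for name, vif in reported.items() if classify_vif(vif) == "high"]
```

A VIF of 10 corresponds to an auxiliary R² of 0.9, meaning 90% of that predictor's variance is explained by the other predictors.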
Points close to the diagonal indicate accurate predictions.
Vertical distance from the diagonal is the prediction error.
Mean residual: 0.0000
Std deviation: 0.5681
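
The residual standard deviation matches the full model's RMSE (0.5681) because OLS residuals with an intercept average to zero, so the two quantities coincide when the standard deviation divides by n. A minimal sketch with toy residuals (the numbers below are illustrative, not taken from the dataset):

```python
import math
from statistics import fmean, pstdev


def rmse(residuals: list[float]) -> float:
    """Root mean squared error of a list of residuals."""
    return math.sqrt(fmean(e * e for e in residuals))


# Toy residuals that average to zero, as OLS residuals do.
toy_residuals = [-1.0, 0.0, 1.0, -0.5, 0.5]
toy_rmse = rmse(toy_residuals)
toy_std = pstdev(toy_residuals)  # population standard deviation (divides by n)
```

If the mean residual were not zero, RMSE would exceed the standard deviation, which makes the comparison a quick sanity check on the fit.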
============================================================
MODEL FIT STATISTICS
============================================================

Baseline Model (k=2):
  R² = 0.3511
  Adjusted R² = 0.3509
  RMSE = 0.6474

Full Model (k=6):
  R² = 0.5004
  Adjusted R² = 0.4999
  RMSE = 0.5681

Model Comparison:
  ΔR² = 0.1493
  ΔAdjusted R² = 0.1490

Conclusion:
  Full model justified: Adjusted R² improvement = 0.1490 (14.90 percentage points)
Average error: 0.0000
Typical error size: 0.4100

Good residuals should be randomly scattered around zero!
Prediction Accuracy Summary:
  Exact category match: 55.8%
  Within one category: 97.8%

Diagonal = correct predictions (darker = more accurate)
Off-diagonal = misclassifications
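
The two accuracy figures above can be computed from ordinal category indices for actual and predicted prices. The sketch below assumes the price bands are coded as consecutive integers (0 = cheapest band, and so on); the output does not show how the notebook defined its bands, so the coding and the function name are hypothetical.

```python
def category_accuracy(actual: list[int], predicted: list[int]) -> tuple[float, float]:
    """Return (exact-match rate, within-one-category rate) for ordinal labels."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    exact = sum(a == p for a, p in zip(actual, predicted)) / n
    within_one = sum(abs(a - p) <= 1 for a, p in zip(actual, predicted)) / n
    return exact, within_one


# Toy example: four listings, one of them off by two bands.
exact, within_one = category_accuracy([0, 1, 2, 3], [0, 2, 0, 3])
```

The within-one rate counts diagonal cells plus their immediate neighbours in the confusion matrix, so it is never lower than the exact-match rate.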